XPAPER.TEX[TEX,DEK] - www.SailDart.org

perm filename XPAPER.TEX[TEX,DEK] blob sn#841366 filedate 1987-06-14 generic text, type T, neo UTF8
% exercises for TeX: The Program

\hsize=6.5in \vsize=8.75in % TUGboat (cleared by BB, Jan 87)
\font\tenbxsl=cmbxsl10 % font for slanted copy in title
\font\eightrm=cmr8
\font\logo=logo10

% the following macros were used to make all the class handouts
\def\\#1{\hbox{\it#1\/\kern.05em}} % italic type for identifiers
\font\tentex=cmtex10 % TeX extended character set (used in strings)
\hyphenchar\tentt=-1 \hyphenchar\tentex=-1
\def\.#1{\hbox{\tentex % typewriter type for strings
  \let\'=\RQ % right quote in a string
  \let\`=\LQ % left quote in a string
  #1}}
\def\LQ{{\tt\char'22}} % left quote in a string
\def\RQ{{\tt\char'23}} % right quote in a string
\def\O#1{\hbox{\rm\char'23\kern-.2em\it#1\/\kern.05em}} % octal constant

\def\begintt{$$\ttverbatim \catcode`\|=0 \parskip=0pt \ttfinish}
\chardef\other=12
\def\ttverbatim{\begingroup
  \catcode`\\=\other \catcode`\{=\other \catcode`\}=\other
  \catcode`\$=\other \catcode`\&=\other \catcode`\#=\other
  \catcode`\%=\other \catcode`\~=\other \catcode`\_=\other
  \catcode`\↑=\other
  \obeyspaces \obeylines \tt}
{\obeyspaces\gdef {\ }}
{\catcode`\|=0 |catcode`|\=\other % | is temporary escape character
  |obeylines % end of line is active
  |gdef|ttfinish#1↑↑M#2\endtt{#1|vbox{#2}|endgroup$$}}
\def\|{{\tt\char`\|}}
\catcode`\|=\active \def|{\ttverbatim\let|=\endgroup}

\def\prob#1. {\medbreak\noindent\hbox to20pt{\bf #1.\hfil}}

\leftline{\bf Exercises for \tenbxsl \TeX: The Program}
\medskip
\leftline{\indent Donald E. Knuth}
\bigskip
\noindent
During the spring of 1987 I taught a course for which Volume~B of
{\sl Computers \& Typesetting\/} was the textbook. Since that book was
meant to serve primarily as a reference, not as a text, I needed
to supplement it with homework exercises and exam problems.

The problems turned out to rather interesting, and I think they might
be useful for self-study if anybody wants to learn {\sl \TeX: The
Program\/} without taking a college course.
Therefore I've collected them here and given what I think are
the correct answers.

The final problem, which deals with the typesetting
of languages that have large character sets, is especially noteworthy since
it presents an extension of \TeX\ that might prove to be useful in Asia.

Some of the problems suggested changes in the text. I've changed my original
wording of the problem statements so that they will make sense when the
next printing of the book comes out; people who have the first edition
should check the published errata before looking too closely at the questions
below.

\medbreak
\noindent{\bf The Problems}
\nobreak\smallskip\noindent
Here, then, are the exercises in the order I gave them. Although they begin
with a rather ``gentle introduction,''
I recommend that the first ones not be skipped, even if they may appear
too easy; there often is a slightly subtle point involved. Conversely,
some of the problems are real stumpers, but they are intended to teach
important lessons. A serious attempt should be made to solve each one
before turning to the answer, if the maximum benefit is to be achieved.

\prob 1. (An exercise about reading a |WEB|.) In the Pascal program defined
by the book, what immediately precedes `|PROCEDURE INITIALIZE|'? (Of
course it's a semicolon, but you should also figure out a few things that
occur immediately before that semicolon.)

\prob 2. Find an unnecessary macro in \S15.

\prob 3. Suppose that you want to make \TeX\ work in an environment where the
input file can contain two-character sequences of the form
`\\{esc}~$x$', where \\{esc}
is ASCII character number \O{33} and where $x$ is an ASCII character between
\.{\'@\'} and \.{\'\char`\_\'} inclusive. The result should be essentially
equivalent to what would have happened if the single (possibly nonprinting)
ASCII character $\\{chr}(\\{ord}(x)-\O{100})$ had been input instead.
If \\{esc} appears without being followed by an~$x$ in the desired range,
you should treat it as if the \\{esc} were ASCII character number \O{177}.

For example, the line `|A| \\{esc} |A| \\{esc} |a| \\{eoln}' should put
four codes into the buffer: \O{101}, \O{001}, \O{177}, \O{141}.

What system-dependent changes will handle this interface requirement?

\prob 4. Suppose that the string at the beginning of the \\{print\_roman\_int}
procedure were |"m2d5c2l2q5v5i"| instead of |"m2d5c2l5x2v5i"|. What
would printed from the input 69? From the input 9999?

\prob 5. Why does \\{error\_count} have a lower bound of~$-1$?

\prob 6. What is printed on the user's terminal after `|q|' is typed in
response to an error prompt? Why?

\prob 7. Give examples of how \TeX\ might fail in the following circumstances:
\smallskip\itemitem{a)}If the test `$t\le7230584$' were eliminated from \S108.
\smallskip\itemitem{b)}If the test `$s\ge1663497$' were eliminated from \S108.
\smallskip\itemitem{c)}If the test `$r>p+1$' were changed to `$r>p$' in \S127.
\smallskip\itemitem{d)}If the test `$\\{rlink}(p)\ne p$' were
	eliminated from \S127.
\smallskip\itemitem{e)}If the test `$\\{lo\_mem\_max}+2\le\\{mem\_bot}+
	\\{max\_halfword}$' were eliminated from \S125.

\prob 8. The purpose of this problem is to figure out what data in \\{mem}
could have generated the following output of \\{show\_node\_list}:
$$\vbox{\halign{#\hfil\qquad&\sevenrm\hfil#\cr
|\hbox(10.0+0.0)x100.0, glue set 10.0fill|&100\cr
|.\discretionary replacing 1|&200\cr
|..\kern 10.0|&300\cr
|.|\||\large U|&10000\cr
|.|\||\large ↑↑K (ligature ff)|&400\rlap{,\thinspace10001,\thinspace10002}\cr
|.\large !|&10003\cr
|.\penalty 5000|&500\cr
|.\glue 0.0 plus 1.0fill|&600\cr
|.\vbox(5.0x0.5)x10.0, shifted -5.0|&700\cr
|..\hbox(5.0x0.0)x10.0|&800\cr
|...\small d|&10004\cr
|...\small a|&10005\cr
|..\rule(0.5+0.0)x*|&900\cr}}$$
Assume that |\large| is font number~1 and that |\small| is font number~2. Also
assume that the nodes used in the lower (variable-size) part of \\{mem}
start in locations 100, 200, etc., as shown; the nodes used in the upper
(one-word) part of \\{mem} should appear in locations 10000, 10001, etc.
Make a diagram that illustrates the exact numeric contents of every
relevant \\{mem} word.

\prob 9. What will \\{short\_display} print, when given the horizontal
list inside the larger |\hbox| in the previous problem, assuming that
\\{font\_in\_short\_display} is initially zero?

\prob 10. Suppose the following commands are executed immediately
after \TeX\ has initialized itself:
$$\hbox{\\{incr}(\\{prev\_depth}); \
\\{decr}(\\{mode\_line}); \
\\{incr}(\\{prev\_graf}); \
\\{show\_activities}.}$$
What will be shown?

\prob 11. What will `\\{show\_eqtb}($\\{int\_base}+17$)' show,
after \TeX\ has initialized itself?

\prob 12. Suppose \TeX\ has been given the following definitions:
\begintt
\def\a{\advance\day by 1\relax} \def\g{\global\a}
\endtt
The effect of this inside \TeX\ will be that an appearance of\/ |\a| calls
$$\\{eq\_word\_define}(p,\\{eqtb}[p].\\{int}+1),$$ and an appearance of\/
|\g| calls
$\\{geq\_word\_define}(p,\\{eqtb}[p].\\{int}+1)$, where
$p=\\{int\_base}+\\{day\_code}$. Consider now the following commands:
\begintt
\day=0 \g\a{\a\g\a{\g\a\g}\a{\a}\a}
\endtt
Each `|{|' calls \\{new\_save\_level}(\\{simple\_group}), and each
`|}|' calls \\{unsave}.

Explain what gets pushed onto and popped off of the \\{save\_stack}, and what
gets stored in $\\{eqtb}[p]$ and $\\{xeq\_level}[p]$, as the above commands
are executed. What is the final value of\/ |\day|? \ (See {\sl The \TeX book},
exercise 15.9 and page 301.)

\prob 13. Use the notation at the bottom of page 122 in
{\sl \TeX: The Program\/} to describe
the contents of the token list corresponding to |\!| after the
definition
\begintt
\def\!!1#2![{!#]#!!2}
\endtt
has been given, assuming that
|[|, |]|, and |!| have the respective catcodes 1, 2, and~6, just as
|{|, |}|, and |#| do. \ (See exercise 20.7 in {\sl The \TeX book}.)

\prob 14. What is the absolute maximum number of characters that
will printed by \\{show\_eqtb}(\\{every\_par\_loc}), if the current value of
|\everypar| does not contain any control sequences? (Hint: The answer
exceeds~40. You may wish to verify this by running \TeX, defining an
appropriate worst-case example, and saying 
\begintt
\tracingrestores=1 \tracingonline=1 {\everypar{}}
\endtt
since this will invoke \\{show\_eqtb} when |\everypar| is restored.)

\prob 15. What does {\tt INITEX} do with the following input line? (Look closely.)
\begintt
\catcode``=7 \`` `(``)```!
\endtt

\prob 16. Explain the error message you get if you say
\begintt
\endlinechar=`! \error
\endtt
in plain \TeX.

\prob 17. Fill in the missing macro definition so that the program
\begintt
\catcode`?=\active
\def\answer{...}
\answer
\endtt
will produce precisely the following error message when run with plain \TeX:
\begintt
! Undefined control sequence.
<recently read> How did this happen?
|noindent|null
l.3 \answer
|noindent|null
? 
\endtt
(This problem is much harder than the others above, but there are
at least three ways to solve it!)

\prob 18. Consider what \TeX\ will do when it processes the following text:
\begintt
{\def\t{\gdef\a##}\catcode`d=12\t1d#2#3{#2}}
\hfuzz=100P\ifdim12pt=1P\expandafter\a
 \expandafter\else\romannumeral888\relax\fi
\showthe\hfuzz \showlists
\endtt
(Assume that the category codes of plain \TeX\ are being used.)

Determine when the subroutines \\{scan\_keyword}, \\{scan\_int}, and
\\{scan\_dimen} are called as this text is being read, and explain
in general terms what results those subroutines produce.

\prob 19. What is the difference in interpretation, if any, between
the following two \TeX\ commands?
\begintt
\thickmuskip=-\thickmuskip
\thickmuskip=-\the\thickmuskip
\endtt
(Assume that plain \TeX\ is being used.) Explain why there is or isn't a
difference.

\prob 20. In what way would \TeX's behavior change if the
assignment at the end of \S508 were changed to `$b\gets(p=\\{null})$'\thinspace?

\prob 21. The initial implementation of \TeX82 had a much simpler
procedure in place of the one now in \S601:
$$\vbox{\halign{#\hfil\cr
{\bf procedure} \\{dvi\_pop};\cr
\quad{\bf begin if} $\\{dvi\_ptr}>0$ {\bf then}\cr
\qquad{\bf if} $\\{dvi\_buf}[\\{dvi\_ptr}-1]=\\{push}$ {\bf then}
	\\{decr}(\\{dvi\_ptr})\cr
\qquad{\bf else} \\{dvi\_out}(\\{pop})\cr
\quad{\bf else} \\{dvi\_out}(\\{pop});\cr
\quad{\bf end};\cr}}$$
(No parameter $l$ was necessary.) Why did the author hang his head in shame
one day and change it to the form it now has?

\prob 22. Assign subscripts $d$, $y$, and $z$ to the sequence
of integers
$$2\,7\,1\,8\,2\,8\,1\,8\,2\,8\,4\,5\,9\,0\,4\,5$$
using the procedure sketched in \S604. (This is easy.)

\prob 23. Find a short \TeX\ program that will cause the
\\{print\_mode} subroutine to print `{\tt no mode}'.
(Do not assume that the category codes or macros of plain \TeX\ have
been preloaded.) Extra credit will be given to the person who has
the shortest program, i.e., the fewest tokens, among all correct
solutions submitted.

\prob 24. The textbook says in \S78 that \\{error} might be called within
\\{error} within a call of \\{error}, but the recursion cannot go any
deeper than this.

Construct a scenario in which \\{error} is entered three times
before it has been completed.

\prob 25. (The following question was the main problem on the midterm exam.)
Suppose {\tt WEB}'s conventions have been changed so that strings are
not identified by their number but rather by their starting position in
the \\{str\_pool} array. The \\{str\_start} array is therefore eliminated.

Strings of length~1 are still represented by their ASCII code values;
all other strings have values $\ge128$, and they appear in
the \\{pool\_file} just as before, in increasing order of starting position.
The special code `128' is assumed to terminate each string.

Thus, for example, if such a {\tt WEB} program uses just the three strings
{\tt"ab"}, {\tt""}, and {\tt"cd"} (in this order),
they will be represented in the corresponding Pascal code by the respective
integers 128, 131, and 132. The program in this case is expected to initialize
\\{str\_pool} locations 128--134 to the successive code values
97, 98, 128, 128, 99, 100, 128.

Such a modification requires lots of changes to \TeX.
Your job in this problem is to indicate what those changes should be.
However, you needn't specify a complete change file; just say how you would
modify \S38--\S48, \S59, \S259, \S407, \S464, and \S602 (if these sections
need to change at all). The other places where \\{str\_start} appears
can be changed in similar ways, and you needn't deal with those.

Some of the specified sections will require new code; you should supply
that code. Other sections may change only a little bit or not at all;
you should just give the grader sufficient explanation of what should
happen there.

\prob 26. Continuing problem 25,
discuss briefly whether or not it would be preferable
(a)~to store the length of
each string just before the first character, instead of using `128' just after
the last character; or (b)~to eliminate the extra `128' entirely and to save
space by adding 128 to the final character.

\prob 27. J. H. Quick (a student) thought he spotted a bug
in \S671 and he was all set to collect \$40.96 because of programs
like this:
\begintt
\vbox{\moveright 1pt\hbox to 2pt{}
  \xleaders\lastbox\vskip 3pt}
\endtt
(He noticed that \TeX\ would give this vbox a width of 2\thinspace pt,
and he thought that the correct width was 3\thinspace pt.) However,
when he typed |\showlists| he saw that the leaders were simply
\begintt
\xleaders 3.0
.\hbox(0.0+0.0)x2.0
\endtt
and he noticed with regret the statement `$\\{shift\_amount}(\\{cur\_box})
\gets0$' in \S1081.

Explain how \S671 would have to be corrected, if
the \\{shift\_amount} of a leader box could be nonzero.

\eject % otherwise the page number is embarrassingly non-centered!
\begingroup
\hsize=3.5in
%\hbadness=-1
\prob 28. When your instructor made up this problem, he said
`|\hbadness=-1|' so that \TeX\ would print out the way each line of
this paragraph was broken. (He sometimes wants to check line breaks
without looking at actual output, when he's using a terminal that
has no display capabilities.) It turned out that \TeX\ typed this:

\endgroup\begingroup\vskip-6pt
\begintt
Loose \hbox (badness 0) in paragraph at lines 11--16
[]\tenrm When your instructor made up this problem, he said
|medskip
Tight \hbox (badness 3) in paragraph at lines 11--16
\tenrm `\tentt \hbadness=-1\tenrm ' so that T[]X would print out the way each
|medskip
Tight \hbox (badness 20) in paragraph at lines 11--16
\tenrm line of this paragraph was broken. (He sometimes wants to
|medskip
Loose \hbox (badness 1) in paragraph at lines 11--16
\tenrm check line breaks without looking at actual output, when
|medskip
Loose \hbox (badness 1) in paragraph at lines 11--16
\tenrm he's using a terminal that has no display capabilities.) It
\endtt
Why wasn't anything shown for the last line of the paragraph?

\endgroup

\prob 29. How would the output of \TeX\ look different if the \\{rebox}
procedure were changed by deleting the statement
`{\bf if} $\\{type}(b)=\\{vlist\_node}$ {\bf then} $b\gets\\{hpack}(b,
	 \\{natural})$'?
How would the output look different if the next conditional statement,
`{\bf if} $(\\{is\_char\_node}(p))$ \dots' were deleted?
(Note that box~$b$ might have been formed by \\{char\_box}.)

\prob 30. What spacing does \TeX\ insert between the characters
when it typesets the formulas |$x==1$|, |$x++1$|, and |$x,,1$|?
Find the places in the program where these spacing decisions are made.

\begingroup
\hsize=3.18in
\tracingparagraphs=1 \pretolerance=-1
\prob 31. When your instructor made up this problem, he said
`|\tracingparagraphs=1|' so that his transcript file would explain why
\TeX\ has broken the paragraph into lines in a particular way. He also said
`|\pretolerance=-1|' so that hyphenation would be tried immediately.
The output is shown on the next page; use it to determine
what line breaks would have been found by a simpler algorithm
that breaks one line at a time. (The simpler algorithm finds the
breakpoint that yields fewest demerits on the first line,
then chooses it and starts over again.)

\endgroup

\pageinsert 
\font\ninett=cmtt9
\vbox to\vsize{\parindent=0pt\obeylines\let\tt=\ninett
\baselineskip=10pt plus 1pt
|% This is the paragraph-trace output referred to in Problem 30:|
|[]\tenrm When your in-struc-tor made up this prob-lem, he |
|@ via @@0 b=0 p=0 d=100|
|@@1: line 1.2 t=100 -> @@0|
|said `\tentt \tracingparagraphs=1\tenrm ' so that his tran-script |
|@ via @@1 b=4 p=0 d=196|
|@@2: line 2.2 t=296 -> @@1|
|file would ex-plain why T[]X has bro-ken the para-|
|@\discretionary via @@2 b=175 p=50 d=46725|
|@@3: line 3.0- t=47021 -> @@2|
|graph |
|@ via @@2 b=25 p=0 d=1225|
|@@4: line 3.3 t=1521 -> @@2|
|into lines in a par-tic-u-lar way. He also said |
|@ via @@3 b=69 p=0 d=6241|
|@@5: line 4.1 t=53262 -> @@3|
|`\tentt \pretolerance=-1\tenrm ' so that hy-phen-ation would be |
|@ via @@5 b=43 p=0 d=2809|
|@@6: line 5.1 t=56071 -> @@5|
|tried im-me-di-ately. The out-put is shown on the next |
|@ via @@6 b=0 p=0 d=100|
|@@7: line 6.2 t=56171 -> @@6|
|page; use it to de-ter-mine what line breaks would |
|@ via @@7 b=153 p=0 d=36569|
|@@8: line 7.0 t=92740 -> @@7|
|have |
|@ via @@7 b=34 p=0 d=1936|
|@@9: line 7.3 t=58107 -> @@7|
|been found by a sim-pler al-go-rithm that breaks |
|@ via @@8 b=1 p=0 d=10121|
|@@10: line 8.2 t=102861 -> @@8|
|one |
|@ via @@9 b=15 p=0 d=10625|
|@@11: line 8.1 t=68732 -> @@9|
|line at a time. (The sim-pler al-go-rithm finds |
|@ via @@10 b=164 p=0 d=40276|
|@@12: line 9.0 t=143137 -> @@10|
|the |
|@ via @@10 b=0 p=0 d=100|
|@ via @@11 b=192 p=0 d=40804|
|@@13: line 9.0 t=109536 -> @@11|
|@@14: line 9.2 t=102961 -> @@10|
|break-point that yields fewest de-mer-its on the |
|@ via @@12 b=174 p=0 d=33856|
|@@15: line 10.0 t=176993 -> @@12|
|first |
|@ via @@12 b=41 p=0 d=12601|
|@ via @@13 b=75 p=0 d=7225|
|@ via @@14 b=75 p=0 d=7225|
|@@16: line 10.1 t=110186 -> @@14|
|line, then chooses it and starts over again.) |
|@\par via @@15 b=0 p=-10000 d=10100|
|@\par via @@16 b=0 p=-10000 d=100|
|@@17: line 11.2- t=110286 -> @@16|
}
\endinsert

\prob 32. Play through the algorithms in parts 42 and 43, to figure
out the contents of \\{trie\_op}, \\{trie\_char}, \\{trie\_link},
\\{hyf\_distance}, \\{hyf\_num}, and \\{hyf\_next} after the statement
\begintt
\patterns{a1bc 2bcd3 ab1cd}
\endtt
has been processed. Then execute the algorithm of \S923, to see
how \TeX\ uses this efficient trie structure
to set the values of \\{hyf} when the word |aabcd| is hyphenated.
[The value of \\{hn} will be~5, and the values of \\{hc}[$1\,.\,.\,5$]
will be $(96,96,97,98,99)$, respectively, when \S923 begins.]

\prob 33. The \\{save\_stack} is normally empty when
a \TeX\ program stops. But if,
say, the user's input has an extra `|{|' (or a missing `|}|'), \TeX\
will print the warning message
\begintt
(\end occurred inside a group at level 1)
\endtt
(see \S1335).
\goodbreak

Explain in detail how to change \TeX\ so that such warning messages will be
more explicit. For example, if the source program has an unmatched `|{|'
on line~6 and an unmatched `|\begingroup|' on line~25, your modified \TeX\
should give two warnings:
\begintt
(\end occurred when \begingroup on line 25 was incomplete)
(\end occurred when { on line 6 was incomplete)
\endtt
You may assume that \\{simple\_group} and \\{semi\_simple\_group} are the
only group codes present on \\{save\_stack} when \S1335 is encountered;
if other group codes are present, your program should call \\{confusion}.

\prob 34. (The following question is the most difficult yet most important
of the entire collection. It was the main problem on the take-home final exam.)

\def\TeXX{\TeX\kern-.3emX}
The purpose of this problem is to extend \TeX\ so that it will sell better
in China and Japan. The extended program, called \TeXX, allows each font
to contain up to 65536 characters. Each extended character is represented
by two values, its `extension'~$x$ and its `code'~$c$, where both $x$ and~$c$
lie between 0 and~255 inclusive. Characters with the same `$c$' but different
`$x$' correspond to different graphics; but they have the same width, height,
depth, and italic correction.

\TeXX\ is identical to \TeX\ except that it has one new primitive command:
|\xchar|. If\/ |\xchar| occurs in vertical mode, it begins a new paragraph;
i.e., it's a $\langle$horizontal command$\rangle$ as on p.~283 of {\sl The
\TeX book}. If\/ |\xchar| occurs in horizontal mode it should be followed
by a $\langle$number$\rangle$ between 0 and 65535; this number can be converted
to the form $256x+c$, where $0\le x,c<256$. The corresponding extended character
from the current font will be appended to the current horizontal list, and the
space factor will be set to 1000.
(If $x=0$, the effect of\/ |\xchar| is something like the effect of\/ |\char|,
except that |\xchar| disables ligatures and kerns and it doesn't do anything
special to the space factor. Moreover, no penalty is inserted
after an |\xchar| that happens to be the |\hyphenchar| of the current font.)
A word containing an extended character will not be hyphenated.
The |\xchar| command should not occur in math mode.

Inside \TeXX, an extended character $(x,c)$ in font~$f$ is represented by
two consecutive \\{char\_node} items $p$ and~$q$, where we have
$\\{font}(p)=\\{null\_font}$, $\\{character}(p)=\\{qi}(x)$, $\\{link}(p)=q$,
$\\{font}(q)=f$, and $\\{character}(q)=\\{qi}(c)$. This two-word representation
is used even when $x=0$.

\TeXX\ typesets an extended character by specifying character number $256x+c$
in the |DVI| file. (See the \\{set2} command in \S585.)

If \TeXX\ is run with the macros of plain \TeX, and if the user types
`|\tracingall| |\xchar600| |\showlists|', the output of \TeXX\ will
include
\begintt
{\xchar}
{horizontal mode: \xchar}
{\showlists}
|noindent|null
### horizontal mode entered at line 0
\hbox(0.0+0.0)x20.0
\tenrm \xchar"258
spacefactor 1000
\endtt
(since 600 is |"258| in hexadecimal notation).

Your job is to explain in detail {\it all\/} changes to \TeX\ that are
necessary to convert it to \TeXX.

[Note: A properly designed extension would also include the primitive operator
|\xchardef|, analogous to |\chardef| and |\mathdef|, because a language
should be `orthogonally complete'.
However, this additional extension has not been included as part
of problem~34, because it presents no special difficulties. Anybody who
can figure out how to implement |\xchar| can certainly also handle |\xchardef|.]

\prob 35. The first edition of {\sl \TeX: The Program\/} suggested that
extended characters could be represented with the following convention:
The first of two consecutive \\{char\_node} items was to contain the font
code and a character code from which the dimensions could be computed
as usual; the second \\{char\_node} was a \\{halfword} giving the actual
character number to be typeset. Fonts were divided into two types,
based on characteristics of their |TFM| headers; `oriental' fonts always
used this two-word representation, other fonts always used the one-word
representation.

Explain why the method suggested in problem 34 is better than this.
(There are at least two reasons.)

% macros copied from WEBMAC will be used for the rest of this paper!
\let\sec=\S
\let\!=\|
\begingroup
\catcode`\@=\other
\catcode`\"=\other
\parskip 0pt % no stretch between paragraphs

\def\\#1{\hbox{\it#1\/\kern.05em}} % italic type for identifiers
\def\|#1{\hbox{$#1$}} % one-letter identifiers look a bit better this way
\def\{\hbox{\bf#1\/}} % boldface type for reserved words
\def\.#1{\hbox{\tentex % typewriter type for strings
  \let\\=\BS % backslash in a string
  \let\'=\RQ % right quote in a string
  \let\`=\LQ % left quote in a string
  \let\{=\LB % left brace in a string
  \let\}=\RB % right brace in a string
  \let\~=\TL % tilde in a string
  \let\ =\SP % space in a string
  \let\_=\UL % underline in a string
  \let\&=\AM % ampersand in a string
  \let\|=\!% vertical line in a string
  #1}}
\def\#{\hbox{\tt\char`\#}} % parameter sign
\def\${\hbox{\tt\char`\$}} % dollar sign
\def\%{\hbox{\tt\char`\%}} % percent sign
\def\↑{\ifmmode\mathchar"222 \else\char`↑ \fi} % pointer or hat
% circumflex accents can be obtained from \↑↑D instead of \↑

\chardef\AM=`\& % ampersand character in a string
\chardef\BS=`\\ % backslash in a string
\chardef\LB=`\{ % left brace in a string
\def\LQ{{\tt\char'22}} % left quote in a string
\chardef\RB=`\} % right brace in a string
\def\RQ{{\tt\char'23}} % right quote in a string
\def\SP{{\tt\char`\ }} % (visible) space in a string
\chardef\TL=`\~ % tilde in a string
\chardef\UL=`\_ % underline character in a string

\newbox\bak \setbox\bak=\hbox to -1em{} % backspace one em
\newbox\bakk\setbox\bakk=\hbox to -2em{} % backspace two ems

\newcount\ind % current indentation in ems
\def\1{\global\advance\ind by1\hangindent\ind em} % indent one more notch
\def\2{\global\advance\ind by-1} % indent one less notch
\def\3#1{\hfil\penalty#10\hfilneg} % optional break within a statement
\def\4{\copy\bak} % backspace one notch
\def\5{\hfil\penalty-1\hfilneg\kern2.5em\copy\bakk\ignorespaces}% optional break
\def\6{\ifmmode\else\par % forced break
  \hangindent\ind em\noindent\kern\ind em\copy\bakk\ignorespaces\fi}
\def\7{\Y\6} % forced break and a little extra space

\let\yskip=\smallskip
\def\to{\mathrel{.\,.}} % double dot, used only in math mode
\def\note#1#2.{\Y\noindent{\hangindent2em\baselineskip10pt\eightrm#1 #2.\par}}
\def\startsection{\Q\noindent\strut{\bf\modno.\quad}}
\def\defin#1{\global\advance\ind by 2 \1\&{#1 }} % begin `define' or `format'
\def\A{\note{See also}} % cross-reference for multiply defined section names
\def\B{\mathopen{\.{@\{}}} % begin controlled comment
\def\C#1{\ifmmode\gdef\XX{\null$\null}\else\gdef\XX{}\fi % Pascal comments
  \XX\hfil\penalty-1\hfilneg\quad$\{\,$#1$\,\}$\XX}
\def\D{\defin{define}} % macro definition
\def\E{\cdot10↑} % exponent in floating point constant
\def\F{\defin{format}} % format definition
\let\G=\ge % greater than or equal sign
\def\H#1{\hbox{\rm\char"7D\tt#1}} % hexadecimal constant
\let\I=\ne % unequal sign
\def\J{\.{@\&}} % TANGLE's join operation
\let\K=\gets % left arrow
\let\L=\le % less than or equal sign
\outer\def\M#1.{\bigbreak\def\modno{#1}\startsection\ignorespaces\iftrue}
\def\O#1{\hbox{\rm\char'23\kern-.2em\it#1\/\kern.05em}} % octal constant
\def\P{\rightskip=0pt plus 100pt minus 10pt % go into Pascal mode
\leftskip=20pt
 \parindent 1em % for paragraphs and for the first line of Pascal text
  \sfcode`;=3000
  \pretolerance 10000
  \hyphenpenalty 10000 \exhyphenpenalty 10000
  \global\ind=2 \1\ \unskip}
\def\Q{\rightskip=0pt % get out of Pascal mode
 \parindent20pt \leftskip=0pt
  \sfcode`;=1500 \pretolerance 200 \hyphenpenalty 50 \exhyphenpenalty 50 }
\let\R=\lnot % logical not
\let\S=\equiv % equivalence sign
\def\T{\mathclose{\.{@\}}}} % terminate controlled comment
\def\U{\note{This code is used in}} % cross-reference for uses of sections
\let\V=\lor % logical or
\let\W=\land % logical and
\def\X#1:#2\X{\ifmmode\gdef\XX{\null$\null}\else\gdef\XX{}\fi % section name
  \XX$\langle\,$#2{\eightrm\kern.5em#1}$\,\rangle$\XX}
\def\Y{\par\yskip}
\let\Z=\let % now you can \send the control sequence \Z
\def\){\hbox{\.{@\$}}} % sign for string pool check sum
\def\]{\hbox{\.{@\\}}} % sign for forced line break
\def\=#1{\kern2pt\hbox{\vrule\vtop{\vbox{\hrule
        \hbox{\strut\kern2pt\.{#1}\kern2pt}}
      \hrule}\vrule}\kern2pt} % verbatim string
\let\~=\ignorespaces

\def\ellipsis{\kern5em\smash{\vdots}\qquad\vbox to12pt{}\par}
\def\hang{\hangindent 3em\noindent\ignorespaces}

\medbreak
\noindent{\bf The Answers}

\prob 1. According to the index, \\{initialize} is declared in \sec4.
It is preceded there by \X13:Global variables\X, and \sec13 tells us that
the final global variable appears in \sec1345. Turning to \sec1345, we
find `\\{write\_loc}: \\{pointer};' and a comment. The comment doesn't
get into the Pascal code. The mini-index at the bottom of page~535
tells us that `\\{pointer}' is a macro defined in \sec115. Our quest
is nearly over, since \sec115 says that \\{pointer} expands to \\{halfword},
which is part of the Pascal program. Page~ix tells us that lowercase letters
of a |WEB| program become uppercase in the corresponding Pascal code;
page~x tells us that the underline in `\\{write\_loc}' is discarded.
Therefore we conclude that `|PROCEDURE| |INITIALIZE|' is immediately
preceded in the Pascal program by `|WRITELOC:HALFWORD;|'.

But this isn't quite correct! The book doesn't tell the whole
story. If we actually run |TANGLE| on |TEX.WEB| (without a change file),
we find that `|PROCEDURE| |INITIALIZE|' is actually preceded by
\begintt
{1345:}WRITELOC:HALFWORD;{:1345}
\endtt
because |TANGLE| inserts comments to show the origin of each block of code.

\prob 2. The index tells us that \\{done5} and \\{done6} are never used.
(They are included only for people who have to make system-dependent
changes and/or extensions.)

\prob 3. Here we change the \\{input\_ln} procedure of \sec31. One
way is to replace the statements `$\\{buffer}[\\{last}]\K\\{xord}[f\↑]$;
$\\{get}(f)$' by the following:
\Y\P\1\1\qquad
\&{if} $\\{ord}(\|f\↑)=\O{33}$ \1\&{then}\6
\&{begin} \37$\\{get}(\|f)$;\6
\&{if} $(\\{ord}(\|f\↑)\G\.{"@"})\W(\\{ord}(\|f\↑)\L\.{"\_"})$ \1\&{then}\6
\&{begin} \37$\\{buffer}[\\{last}]\K\\{xord}[\\{chr}(\\{ord}(\|f\↑)-\O{100})]$;%
\5
$\\{get}(\|f)$;\6
\&{end}\6
\4\&{else} $\\{buffer}[\\{last}]\K\\{invalid\_code}$;\2\6
\&{end}\6
\4\&{else} \&{begin} \37$\\{buffer}[\\{last}]\K\\{xord}[\|f\↑]$;\5
$\\{get}(\|f)$;\6
\&{end};\par\Q

\prob 4. The new string essentially substitutes ``quarters''~|q| (of
value~25) for ``dimes''~|x| (of value~10). Playing through the code of\/ \sec69
tells us that 69 is now represented by |lvvviv| and 9999 is |mmmmmmmmmcmqcvqiv|.
(The first nine |m|'s make 9000; then |cm| makes 900; then |qc| makes~75;
then |vq| makes~20; and |iv| makes the remaining~4.)

\prob 5. Because it may be decreased by~1 in \sec1293 before being increased
by~1 in~\sec82.
(The code in \sec1293 decreases \\{error\_count} because ``showing'' uses the
\\{error} subroutine although it isn't really an error.)

\prob 6. The |q| becomes |Q| in \sec83. This causes \sec86 to print
`|OK, entering \batchmode|', after which \\{selector} is decreased so
that `|...|' and $\langle$return$\rangle$ are {\it not\/} printed
on the terminal! (They appear only in the log file, if it has been opened.)
This is \TeX's way of confirming that |\batchmode| has indeed been entered.

\prob 7. (a)~Arithmetic overflow might occur when computing $t\ast297$,
because $7230585\times297=2↑{31}+97$. (b)~Some sort of test is need to
avoid division by zero when $0<s<297$. If $s<1663497$ then $s$~{\bf div}~$297<
5601$, and $7230585/5600$ is a bit larger than 1291 so we will have
$r>1290$ in such a case. The threshold value has therefore been chosen
to save division whenever possible. (One student suggested that the
statement `$r\K t$' be replaced by `$r\K1291$'. That might or might not
be faster, depending on the computer and the Pascal compiler. In machine
language one would `{\bf goto}' the statement that sets $\\{badness}\K
\\{inf\_bad}$, but that~is~inadmissible Pascal.)
(c)~If we get to \sec128 with $r=p+1$, we will try to make a node of size~1,
but~then there's no room for the \\{node\_size} field. (d)~If we get to
\sec129 with only one node available, we'll lose everything and \\{rover}
will be invalid. (Older versions of \TeX\ have a more complicated test
in \sec127, which would suppress going to \sec129 if there were two nodes
available. That was unnecessarily cautious.) (e)~This is a subtle one.
The lower part of memory must not be allowed to grow so large that
a \\{node\_size} value could ever exceed \\{max\_halfword} when nodes
are being merged together in \sec127.

\prob 8. We assume that $\\{min\_quarterword}=\\{min\_halfword}=0$.
$$\def\|#1{\tt\hfil#1 } % copy inside a box
\let\,=\thinspace
\setbox\strutbox=\hbox{\vrule height9pt depth3pt width0pt}
\def\g{\hfil\ignorespaces} % for floating point
\def\qqh#1#2#3{\vtop{\hbox{%
	\hbox to 30pt{\|{#1}\kern-.2pt\vrule\kern-.2pt\strut}%
	\hbox to 30pt{\|{#2}\kern-.2pt\vrule\kern-.2pt\strut}%
	\hbox to 60pt{\|{#3}}}\kern-.2pt\hrule\kern-.2pt}}
\def\hh#1#2{\vtop{\hbox{%
	\hbox to 60pt{\|{#1}\kern-.2pt\vrule\kern-.2pt\strut}%
	\hbox to 60pt{\|{#2}}}\kern-.2pt\hrule\kern-.2pt}}
\def\w#1{\vtop{\hbox{\strut
	\hbox to 120pt{\|{#1}}}\kern-.2pt\hrule\kern-.2pt}}
\def\topline{\noalign{\vskip-\baselineskip}
 &\omit&\w{}\cr}
\halign{&\strut\kern20pt\tt#\,\hfil&\vrule#\kern-.2pt&
 #&\kern-.2pt\vrule#&\quad#\hfil\cr
\topline
100:&&\qqh000&&\\{type} (\\{hlist\_node}), , \\{link}\cr
101:&&\w{6553600}&&\\{width} (100\,pt)\cr
102:&&\w0&&\\{depth}\cr
103:&&\w{655360}&&\\{height} (10\,pt)\cr
104:&&\w0&&\\{shift\_amount}\cr
105:&&\qqh12{200}&&
 \\{glue\_sign} (\\{stretching}), \\{glue\_order} (\\{fill}), \\{list\_ptr}\cr
106:&&\w{10.0\g}&&\\{glue\_set} (type \\{real})\cr
\noalign{\medskip}\topline
200:&&\qqh71{10003}&&\\{type} (\\{disc\_node}), \\{replace\_count}, \\{link}\cr
201:&&\hh{300}{10000}&&\\{pre\_break}, \\{post\_break}\cr
\noalign{\medskip}\topline
300:&&\qqh{11}10&&\\{type} (\\{kern\_node}), \\{subtype} (\\{explicit}), %
 \\{link}\cr
301:&&\w{655360}&&\\{width} (10\,pt)\cr
\noalign{\medskip}\topline
400:&&\qqh600&&\\{type} (\\{ligature\_node}), , \\{link}\cr
401:&&\qqh1{11}{10001}&&\\{font}, \\{character}, \\{lig\_ptr}\cr
\noalign{\medskip}\topline
500:&&\qqh{12}0{600}&&\\{type} (\\{penalty\_node}), , \\{link}\cr
501:&&\w{5000}&&\\{penalty}\cr
\noalign{\medskip}\topline
600:&&\qqh{10}0{700}&&\\{type} (\\{glue\_node}), \\{subtype} (\\{normal}), \\{link}\cr
601:&&\hh80&&\\{glue\_ptr} (\\{fill\_glue}), \\{leader\_ptr}\cr
\noalign{\medskip}\topline
700:&&\qqh100&&\\{type} (\\{vlist\_node}), , \\{link}\cr
701:&&\w{655360}&&\\{width} (10\,pt)\cr
702:&&\w{32768}&&\\{depth} (0.5\,pt)\cr
703:&&\w{327680}&&\\{height} (5\,pt)\cr
704:&&\w{-327680}&&\\{shift\_amount} ($-5$\,pt)\cr
705:&&\qqh00{800}&&
 \\{glue\_sign} (\\{normal}), \\{glue\_order} (\\{normal}), \\{list\_ptr}\cr
706:&&\w{0.0\g}&&\\{glue\_set} (type \\{real})\cr
\noalign{\medskip}\topline
800:&&\qqh00{900}&&\\{type} (\\{hlist\_node}), , \\{link}\cr
801:&&\w{655360}&&\\{width} (10\,pt)\cr
802:&&\w0&&\\{depth}\cr
803:&&\w{327680}&&\\{height} (5\,pt)\cr
804:&&\w0&&\\{shift\_amount}\cr
805:&&\qqh00{10004}&&
 \\{glue\_sign} (\\{normal}), \\{glue\_order} (\\{normal}), \\{list\_ptr}\cr
806:&&\w{0.0\g}&&\\{glue\_set} (type \\{real})\cr
\noalign{\medskip}\topline
900:&&\qqh200&&\\{type} (\\{rule\_node}), , \\{link}\cr
901:&&\w{-1073741824}&&\\{width} (\\{null\_flag})\cr
902:&&\w0&&\\{depth}\cr
903:&&\w{32768}&&\\{height} (0.5\,pt)\cr
\noalign{\medskip}\topline
\llap{10}000:&&\qqh1{"U"}{400}&&\\{font}, \\{character}, \\{link}\cr
\llap{10}001:&&\qqh1{"f"}{10002}&&\\{font}, \\{character}, \\{link}\cr
\llap{10}002:&&\qqh1{"f"}0&&\\{font}, \\{character}, \\{link}\cr
\llap{10}003:&&\qqh1{"!"}{500}&&\\{font}, \\{character}, \\{link}\cr
\llap{10}004:&&\qqh2{"d"}{10005}&&\\{font}, \\{character}, \\{link}\cr
\llap{10}005:&&\qqh2{"a"}0&&\\{font}, \\{character}, \\{link}\cr
}$$

\prob 9. (Norwegian Americans will recognize this as an `Uff da' joke.) The
output of \\{short\_display} is
\begintt
\large Uff []
\endtt
since \\{short\_display} shows the pre-break and post-break parts of a
discretionary (but not the replacement text). However, if this box were
output by \\{hlist\_out}, the discretionary break would not be effective;
the result would be a box 100$\,$pt wide, beginning with a large `!' and
ending with a small `da', the latter being raised 5$\,$pt and underlined
with a 0.5$\,$pt-rule.

\prob 10. Since \\{prev\_depth} is initially \\{ignore\_depth}, we get
\begintt
### vertical mode entered at line 1 (\output routine)
prevdepth -999.99998, prevgraf 1 line
\endtt

\prob 11. According to \sec236, $\\{int\_base}+17$ is where \\{mag} is
stored. (One of the definitions suppressed by an ellipsis on page 101 is
\\{mag}; you can verify this by checking the index!) The initial value
of \\{mag} is set in \sec240. Hence \\{show\_eqtb} branches to \sec242 and
prints `|\mag=1000|'.

\begingroup
\def\|#1{{\sevenrm(#1)}}
\prob 12. In the following chart, `\|3' means a value at level three, and
`---' is a level boundary:
$$\advance\abovedisplayskip-6pt
\def\-{---} \chardef\{=`\{ \chardef\}=`\}
\def\:{\omit\tt\quad\char`}
\halign{\kern15pt\hfil#&&\hfil\hbox to0pt{\hss#\hss}\cr
&&&&&&&&&&&&&&&\|2\cr
&&&&&&&&&&&&&&&9\cr
&&&&&&&&&&\|1&\|1&&&\-&\-\cr
&&&&&&&&&&6&6&&\|1&\|1&\|1&\|1&\|1\cr
&&&&&&&&\-&\-&\-&\-&&8&8&8&8&8\cr
&&&&&&&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1\cr
&&&&&&&4&4&4&4&4&4&4&4&4&4&4\cr
&&&&&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1&\|1\cr
&&&&&2&2&2&2&2&2&2&2&2&2&2&2&2\cr
\\{save\_stack}:&&&&\-&\-&\-&\-&\-&\-&\-&\-&\-&\-&\-&\-&\-&\-\cr
\\{xeq\_level}$[p]$:&\|1&\|1&\|1&\|1&\|2&\|1&\|2&\|2&\|1&\|3&\|1&\|1&\|2&\|2&\|3&\|2&\|2&\|1\cr
\\{eqtb}$[p]$.\\{int}:&0&1&2&2&3&4&5&5&6&7&8&8&9&9&10&9&10&8\cr
operations:&\:\\day=0&\:\\g&\:\\a&\:\{\ &\:\\a&\:\\g&\:\\a&\:\{\ &
 \:\\g&\:\\a&\:\\g&\:\}\ &\:\\a&\:\{\ &\:\\a&\:\}\ &\:\\a&\:\}\ \cr}$$
The final value is therefore |\day=8|.
\endgroup

{\def\,{\hskip2pt}
\prob 13. (reference count), \\{match}\,|!|, \\{match}\,|#|, \\{left\_brace}\,|[|,
\\{end\_match}, \\{left\_brace}\,|{|, \\{mac\_param}\,|#|, \\{right\_brace}\,|]|,
\\{mac\_param}\,|!|, \\{out\_param}\,2, \\{left\_brace}\,|[|. Notice that the
\\{left\_brace} before the \\{end\_match} is repeated at the end of the replacement
text, because it has been matched (and therefore removed from the input).}

\prob 14. According to \sec233, \\{show\_eqtb}(\\{every\_par\_loc}) calls
\\{show\_token\_list} with the limit~$l=32$. According to \sec292, we want
the token list to contain a token that prints as many characters as possible
when $\\{tally}=31$;
the value of \\{tally} is increased on every call to \\{print\_char} (\sec58).
By studying the cases in \sec294, we conclude that the worst case occurs
when a \\{mac\_param} is printed, and when the character~$c$ actually
prints as three characters. The statement `\\{print\_esc}(|"ETC."|)' in \sec292
will print seven additional
characters if the current \\{escape\_char} is another tripler.  (Longer
examples are possible only if \TeX\ has a bug that tweaks one of the
outputs `|\CLOBBERED.|' or `|\BAD.|' in \sec293; but this can't happen.)

In other words, a worst-case example such as
\begintt
\escapechar=`\↑↑M \catcode`\↑↑I=6
\everypar{1234567890123456789012345678901↑↑Ietc.}
\endtt
in connection with the suggested test line will print
\begintt
{restoring ↑↑Meverypar=1234567890123456789012345678901↑↑I↑↑I↑↑METC.}
\endtt
thereby proving that 44 characters can be printed by
\\{show\_eqtb}(\\{every\_par\_loc}).

\prob 15. Here we must look at the \\{get\_next} procedure, which scans
the \\{buffer} in strange ways when two identical characters of
category~7 (\\{sup\_mark}) are found. After the |\catcode| of open-quote
has been set to~7, \\{get\_next} begins to scan a control sequence
in \sec354, which goes to \sec355 and finds a space after |``|. Since
a space is code \O{40}, it is changed to \O{140}, and the buffer
contents are shifted left~2. By strange coincidence, \O{140} is again
an open-quote character, so we get back to \sec355, which changes
|``(| to |h| and goes back to \\{start\_cs} a third time. Now we go
to \sec356 and then back to \sec355 and \\{start\_cs}, having changed
|``)| to |i|. The fourth round, similarly, changes |```| to a blank
space, and the fifth round finishes the control sequence.

If we try to input the stated line, |INITEX| will come to a halt as follows:
\begintt
! Undefined control sequence.
<*> \catcode``=7 \hi
                     !\error
\endtt
This proves that the \\{buffer} now says |\hi !|.

\prob 16. The error message in question is
\begintt
! Undefined control sequence.
<*> \endlinechar=`! \error
                          ↑↑M
\endtt
and our job is to explain the appearance of |↑↑M|. The standard |\endlinechar|
is \\{carriage\_return}, according to \sec240; this is \O{15} according
to \sec22, and \O{15} is |↑↑M| in ASCII code.
Thus, a \\{carriage\_return} is normally placed at the end of
each line when it's read into the \\{buffer} (see \sec360).
This \\{carriage\_return}
is not usually printed in an error message, because it equals the
\\{end\_line\_char} (see \sec318). We see it now because \\{end\_line\_char}
has changed.

Incidentally, if the input line had been
\begintt
\endlinechar=`!\error
\endtt
(without the space after the |!|), we wouldn't have seen the |↑↑M|. Why not?
Because \TeX\ calls \\{get\_next} when looking for the optional space
after the ASCII constant |`!| (see \sec442--443),
hence the undefined control sequence |\error|
is encountered before \\{end\_line\_char} has been changed!

\prob 17. One problem is to figure out which control sequence is undefined;
it seems to be the `|?|', since this character has been made active.
One clue is to observe from \sec312 and \sec314 that `|<recently read>|'
can be printed only when $\\{base\_ptr}=\\{input\_ptr}$, $\\{state}=
\\{token\_list}$, $\\{token\_type}=\\{backed\_up}$, and $\\{loc}=\\{null}$.
A token list of type \\{backed\_up} usually contains only a single item;
in that case, the control sequence name must be `|How did this happen?|',
and we have a problem getting an active character into a control sequence name.

But an arbitrarily long token list of type \\{backed\_up} can be created with the
|\lowercase| operation (see \sec1288). In that case, however, the right brace
that closes |\lowercase| is almost always still present in \TeX's input
state, and it would show up on the error message. (The \\{back\_list}
procedure of \sec323 does not clear a completed token list off of the stack.)
We have to make \TeX\ clear off its stack before the |}| is scanned.

At this point the exercise begins to resemble ``retrograde chess'' problems.
Here is one solution; since it requires a very long input line, it has been
broken into a three-line answer:
\begintt
\def\answer{\let~\expandafter\lccode`!=`H% [line has been broken]
 ~\lowercase~{~!~o~w~ ~d~i~d~ ~t~h~i~s~  % [line has been broken]
 ~h~a~p~p~e~n~?}}
\endtt
(The `|H|' is a lowercase `|!|';
a chain of\/ |\expandafter|'s is used to make the right brace disappear from
the stack.)

Another approach uses |\csname|, and manufactures a |?| from a |!|:
\begintt
\def\answer{\def\a##1{{\global\let##1?\aftergroup##1}}% [broken]
 \escapechar`H\lccode`!|/`?                             % [broken]
 \lowercase{\expandafter\a\csname ow did this happen!\endcsname}}
\endtt

But there is a (devious) one-line solution, which makes the
invisible \\{carriage\_return} following |\answer| into a right brace:
\begintt
\def\answer{\catcode13=2\lccode`!=H\lowercase\bgroup!ow did this happen?}
\endtt

\prob 18. (The answer to this problem was much more difficult to explain
in class than I had thought it would be, so I guess it was also much
more difficult for the students to solve than I had thought it would be.
After my first attempt to explain the answer,
I decided to make up a special version of \TeX\ that would help to
clarify the scanning routines. This special program, called Demo\TeX,
is just like ordinary
\TeX\ except that if\/ |\tracingstats>2| the user is able to watch \TeX's
syntax routines in slow motion. The changes that convert \TeX\ to
Demo\TeX\ are explained in the appendix below.
Given Demo\TeX, we tried a lot of simple
examples of things like `|\hfuzz=1.5pt|' and `|\catcode`a=11|' before
plunging into exercise~18 in which everything happens at once.
While we were discussing input stacks, by the way, we found it
helpful to consider the behavior of \TeX\ on the following input:
\begintt
\output{\botmark}
\def\a{\error}
\mark{
 \everyvbox{
  \everypar{
   \everydisplay{
    \everyhbox{
     \everymath{\noexpand\a}
     $\relax}
    \hbox\bgroup\relax}
   $$\relax}
  \noindent\relax}
 \vbox\bgroup\relax}
\hbox{}\vfill\penalty-10000
\endtt
Here |\penalty| triggers |\botmark|, which defines |\everyvbox| and begins
a |\vbox|, which defines |\everypar| and begins a |\par|, which defines
|\everydisplay| and begins a |\display|, etc.)

The first line is essentially `|\gdef\a#1d#2#3{#2}|', where the second `|d|'
has catcode~12 (\\{other\_char}). Hence~the second |d| will match a~|d| that
is generated by |\romannumeral|. In this line, \\{scan\_int} is called only
to scan the |`d| and the |12|.

The second line calls \\{scan\_dimen} in order to evaluate the right-hand
side of the assignment to |\hfuzz|. After \\{scan\_dimen} has used
\\{scan\_int} to read the `|100|', it calls \\{scan\_keyword} in order to
figure out the units. But before the units are known to be `|pt|' or `|pc|',
an |\ifdim| must be expanded. Here we need to call \\{scan\_dimen}
recursively, twice; it finds the value 12$\,$pt on the left-hand side,
and is interrupted again while \\{scan\_keyword} is trying to figure out
the units on the right-hand side. Now a chain of\/ |\expandafter|'s causes
|\romannumeral888| to be expanded into |dccclxxxviii|, and then we have
to parse |\a dccclxxxviii|. Here |#1| will be |\else|, |#2| and |#3| will
each be |c|; the expansion therefore reduces to |cclxxxviii\relax\fi|.
The first `|c|' completes the second `|Pc|', and the |\ifdim| test is true.
Therefore the second `|c|' can complete the first `|Pc|', and |\hfuzz|
is set equal to 1200$\,$pt. The characters |lxxxviii| now begin a
paragraph. The |\fi| takes the |\ifdim| out of \TeX's condition stack.

(The appendix below gives further information.
Examples like this give some glimmering of the weird maneuvers that can be
found in the |TRIP| test, an intricate pattern of unlikely code that is
used to validate all implementations of \TeX.)

\prob 19. If, for example, |\thickmuskip| has the value |5mu plus 5mu|
that plain \TeX\ gives it, the first command changes its value to
|-5mu plus -5mu|, because \\{scan\_glue} in \sec461 will call
\\{scan\_something\_internal} with the second argument \\{true}; this
will cause all three components of the glue to be negated (see \sec431).

The second command, on the other hand, tells \TeX\ to expand `|\the\thickmuskip|'
into a sequence of characters, so it is equivalent to
\begintt
\thickmuskip=-5mu plus 5mu
\endtt
(The minus sign doesn't carry into the stretch component of glue, since
\sec461 applies \\{negate} only to the first dimension found.)

This problem points out a well-known danger that is present in any
text-macro-expanding system.

\prob 20. We'd have a funny result that two macro texts would be considered
to match by |\ifx| unless the first one (the one starting at~$q$ when we
begin \sec508) is a proper prefix of the second. (Notice the statement
`$p\K\\{null}$' inside the {\bf while} loop.)

\prob 21. Because the byte in \\{dvi\_buf}[$\\{dvi\_ptr}-1$] is usually not
an operation code, and it just might happen to equal \\{push}.

\prob 22. $2_y\,7_d\,1_d\,8_z\,2_y\,8_z\,1_d\,8_z\,2_y\,8_z\,4_y\,5_z\,
 9_d\,0_d\,4_y\,5_z$.

\prob 23. \TeX\ is in `no mode' only while processing |\write| statements,
and the mode is printed during |\write| only when
$\\{tracing\_commands}>1$ during \\{expand}.  We might think that
|\catcode| operations are necessary, so that the left and right
braces for |\write| exist; but it's possible to let \TeX's error-recovery
mechanism supply them! Therefore the shortest program that meets the
requirements is probably the following one based on an idea
due to Ronaldo Am\'a, who suggests putting
\begintt
\batchmode\tracingcommands2\immediate\write!\nomode
\endtt
into a file. (Seven tokens total.)

\prob 24. When \\{error} calls \\{get\_token}, because the user has asked
for tokens to be deleted (see \sec88), a second level of \\{error} is
possible, but further deletions are disallowed (see \sec336 and \sec346).
However, insertions are still allowed, and this can lead to a third
level of \\{error} when \\{overflow} calls \\{succumb}.

For example, let's assume that $\\{max\_in\_open}=6$. Then you can type
`|\catcode`?=15 \x|' and respond to the undefined control sequence error
by saying `|i\x??|' six times. This leads to a call of \\{error} in which
six `|<insert>|' levels appear; hence $\\{in\_open}=6$, and one more insertion
will be the last straw. At this point, type `|1|'; this enters \\{error}
at a second level, from which `|i|' will enter \\{error} a third time.
(The run-time stack now has \\{main\_control} calling \\{get\_x\_token}
calling \\{expand} calling \\{error} calling \\{get\_token} calling
\\{get\_next} calling \\{error} calling \\{begin\_file\_reading} calling
\\{overflow} calling \\{error}.)

\prob 25. In \sec38, define \\{str\_number} to be the same as \\{pool\_pointer},
and define $\\{str\_end}=128$.
In \sec39, delete the declaration of \\{str\_start}.
In \sec40, declare
\Y\P\4\&{function}\1\  \37$\\{length}(\|s:\\{str\_number})$: \37\\{integer};\6
\4\&{var} \37\|t: \37\\{pool\_pointer};\2\6
\&{begin} \37$\|t\K\|s$;\6
\&{while} $\\{str\_pool}[\|t]\I\\{str\_end}$ \1\&{do}\5
$\\{incr}(\|t)$;\2\6
$\\{length}\K\|t-\|s$;\6
\&{end};\par
\Y\Q
\noindent
In \sec41, define $\\{cur\_length}\S(\\{pool\_ptr}-\\{str\_ptr})$. In %
\sec43, declare
\Y\P\4\&{function}\1\  \37\\{make\_string}: \37\\{str\_number};\C{current
string enters the pool}\6
\4\&{var} \37\|t: \37\\{str\_number};\C{the result}\2\6
\&{begin} \37$\\{str\_room}(1)$;\5
$\\{append}(\\{str\_end})$;\6
$\|t\K\\{str\_ptr}$;\5
$\\{str\_ptr}\K\\{pool\_ptr}$;\5
$\\{make\_string}\K\|t$;\6
\&{end};\par
\Y\Q
\noindent
In \sec44, we can
\Y\P\D \37$\\{flush\_string}\S$\ \&{begin} \37\1\&{repeat} \37$\\{decr}(\\{str%
\_ptr})$;\6
\4\&{until}\5
$\\{str\_pool}[\\{str\_ptr}-1]=\\{str\_end}$;\2\6
$\\{pool\_ptr}\K\\{str\_ptr}$;\6
\&{end}\par
\Y\Q

The comparison function in \sec45 is used only in \sec259, where we can
replace `\&{if} $\\{length}(\\{text}(\|p))=\|l$ \&{then}   \&{if} $\\{str\_eq%
\_buf}(\\{text}(\|p),\|j)$' by
`\&{if} $\\{str\_eq\_buf}(\\{text}(\|p),\|j,\|l)$'. The function now has
three parameters:
\Y\P\4\&{function}\1\  \37$\\{str\_eq\_buf}(\|s:\\{str\_number};\,\35\|k,\39%
\|l:\\{integer})$: \37\\{boolean};\C{test equality of strings}\6
\4\&{label} \37\\{exit};\6
\4\&{var} \37\|j: \37\\{pool\_pointer};\C{running index}\2\6
\&{begin} \37$\|j\K\|s$;\5
$\|s\K\|s+\|l$;\6
\&{if} $\\{str\_pool}[\|s]\I\\{str\_end}$ \1\&{then}\5
$\\{str\_eq\_buf}\K\\{false}$\6
\4\&{else} \&{begin} \37\&{while} $\|j<\|s$ \1\&{do}\6
\&{begin} \37\&{if} $\\{str\_pool}[\|j]\I\\{buffer}[\|k]$ \1\&{then}\6
\&{begin} \37$\\{str\_eq\_buf}\K\\{false}$;\5
\\{return};\5
\&{end};\2\6
$\\{incr}(\|j)$;\5
$\\{incr}(\|k)$;\6
\&{end};\2\6
$\\{str\_eq\_buf}\K\\{true}$;\6
\&{end};\2\6
\4\\{exit}: \37\&{end};\par
\Y\Q
\noindent
The procedure of \sec46 is modified in an obvious, similar way.

The first three statements of \sec47 become just two:
`$\\{pool\_ptr}\K128$; $\\{str\_ptr}\K128$'. The body of the {\bf for} loop in %
\sec48
becomes just
\Y\P\&{if} $(\X49:Character \|k cannot be printed\X)$ \1\&{then}\6
\&{if} $\|k<\O{100}$ \1\&{then}\5
$\\{str\_pool}[\|k]\K\|k+\O{100}$\6
\4\&{else} $\\{str\_pool}[\|k]\K\|k-\O{100}$\2\6
\4\&{else} $\\{str\_pool}[\|k]\K\|k$\2\par
\Y\Q

In \sec59, variable $j$ is no longer needed. If $0\L\|s<128$ and if $s$
isn't the current new-line character, we now say
\Y\P\&{begin} \37\&{if} $\\{str\_pool}[\|s]\I\|s$ \1\&{then}\6
\&{begin} \37$\\{print\_char}(\.{"\↑"})$;\5
$\\{print\_char}(\.{"\↑"})$;\6
\&{end};\2\6
$\\{print\_char}(\\{str\_pool}[\|s])$;\6
\&{end}\par
\Y\Q
\noindent
In the other case, where $\|s\G128$, we say
\Y\P\&{while} $\\{str\_pool}[\|s]\I\\{str\_end}$ \1\&{do}\6
\&{begin} \37$\\{print\_char}(\\{str\_pool}[\|s])$;\5
$\\{incr}(\|s)$;\6
\&{end}\2\par
\Y\Q
\noindent
In \sec407, similarly, variable $k$ is eliminated; the loop on~$k$
becomes a loop on~$s$,  \&{while} $\\{str\_pool}[\|s]\I\\{str\_end}$.

In \sec464, replace the two occurrences of `$\\{str\_start}[\\{str\_ptr}]$' by
`\\{str\_ptr}'.

The first loop in \sec603 becomes
\Y\P$\|k\K\\{font\_area}[\|f]$;\6
\&{while} $\\{str\_pool}[\|k]\I\\{str\_end}$ \1\&{do}\6
\&{begin} \37$\\{dvi\_out}(\\{str\_pool}[\|k])$;\5
$\\{incr}(\|k)$;\6
\&{end}\2\par
\Y\Q
\noindent and the second is like unto it.

\prob 26. Let's assume that we have a machine in which \\{str\_pool} is
addressed by byte number, so that 8-bit values take no more space than
7-bit values. Method~(a) requires us to impose a limit on the length
of strings: 255 characters max. This isn't unreasonable, because the only
important use of longer strings is in the implementation of\/ |\special|,
when the restriction doesn't actually apply (since \sec1368 doesn't
call \\{make\_string}). But method~(a) saves no space and little or no time
by comparison with the simpler method of problem~25. Problem~25 saves
about one byte per string, compared to the text's way. Method~(b) saves
another byte per string but at the expense of considerable programming
complexity; it requires awkward special-casing to deal with empty strings.

\prob 27. We'd replace `\\{width}$(g)$' by `$\\{width}(g)+
\\{shift\_amount}(g)$' (twice). Similar changes would be needed in \sec656.
(But a box shouldn't be able to retain its \\{shift\_amount}; this quantity
is a property of the list the box is in, not a property of the box itself.)

\prob 28. The final line has infinite stretchability, since plain \TeX\
sets |\parfillskip=0pt plus 1fil|. Reports of loose, tight, underfull,
or overfull boxes are never made unless $o=\\{normal}$ in \sec658 and \sec664.

\prob 29. If a vbox is repackaged as an hbox, we get really weird results
because things that were supposed to stack up vertically are placed together
horizontally. The second change would be a lot less visible, except in
characters like $V$ where there is a large italic correction; the character
would be centered without taking its italic correction into account.
(The italic correction in math mode is the difference between horizontal
placement of superscripts and subscripts in formulas like $V_2↑2$.)

\prob 30. The spacing can be found by saying
\begintt
$x==1$ $x++1$ $x,,1$ \tracingall\showlists.
\endtt
Most of the decisions are made in \sec766, using the spacing table of\/
\sec764. But the situation is trickier in the case of~|+|, because a
\\{bin\_noad} must be preceded and followed by a noad of a suitable class.
In the formula |$x++1$|, the second~|+| is changed from \\{bin\_noad} to
\\{ord\_noad} in \sec728. It turns out that thick spaces are inserted after the~$x$
and before the~1 in `$x==1$'; medium spaces are inserted before each~$+$ sign in
`$x++1$'; thin spaces are inserted after each comma in `$x,,1$'.

\prob 31. The behavior of the simpler algorithm, which we may call Brand~X,
can be deduced from the demerits values (`|d=|') in the trace output.
There is only one reasonable choice, |@@1|, for the first line; and there's
only one, |@@2|, for the second. But for the third, a line from |@@2| to
|@@3| (the break after `|para-|') has 46725 demerits, which certainly looks
worse than the 1225 demerits from |@@2| to |@@4|. This, however, leads Brand~X
into a trap, since there's no good way to continue from |@@4|. Similarly,
Brand~X will choose to go from |@@7| to |@@9|, and this forces it to
|@@11| and then infelicitously to |@@13| (because the syllable `|break-|'
is too long to be squeezed in). The resulting paragraph, as typeset by
Brand~X, looks like this (awful):
$$\vbox{\hbadness=2000
\hsize=3.18in
\pretolerance=-1
\prob 31. When your instructor made up this problem, he said
`|\tracingparagraphs=1|' so that his transcript file would explain why
\TeX\ has broken the paragraph\break into lines in a particular way. He also said\break
`|\pretolerance=-1|' so that hyphenation would be tried immediately.
The output is shown on the next page; use it to determine
what line breaks would have\break been found by a simpler algorithm
that breaks one line at a time. (The simpler algorithm finds the
breakpoint that yields fewest demerits on the first line,
then chooses it and starts over again.)}$$

\prob 32. (This exercise takes awhile, but the data structures are especially
interesting; the hyphenation algorithm is a nice little part of the program
that can be studied in isolation.) The following tables are constructed:
$$\setbox0=\hbox{]} \def\]{\kern-\wd0}
\vbox{\halign{&\kern20pt\hfil#\quad&\hfil#\quad&\hfil#\quad&\hfil#\qquad\cr
&\\{op}\]&\\{char}\]&\\{link}\]&&[1]\]&[2]\]&[3]\]\cr
\noalign{\vskip2pt}
\\{trie}[96]&0&96&1&\\{hyf\_distance}&2&0&3\cr
\\{trie}[97]&0&97&5&\\{hyf\_num}&1&3&2\cr
\\{trie}[98]&0&97&2&\\{hyf\_next}&0&0&2\cr
\\{trie}[100]&1&98&3\cr
\\{trie}[102]&1&99&4\cr
\\{trie}[103]&0&98&6\cr
\\{trie}[105]&3&99&4\cr}}$$
Given the word |aabcd|, it is interesting to watch \sec923 produce the
hyphenation numbers
`{\def\\#1{$_{\kern\scriptspace#1}$}\tt\\0a\\0a\\2b\\1c\\0d\\3}' from this trie.

\prob 33. The idea is to keep line numbers on the save stack. Scott Douglass
has observed that, although \TeX\ is careful to keep \\{cur\_boundary} up to date,
nothing important is ever done with it; hence the \\{save\_index} field in
level-boundary words is not needed, and we have an extra halfword to play with!
(The present data structure has fossilized elements left over from old
incarnations of \TeX.)
However, line numbers might get larger than a halfword; it seems better
to store them as fullword integers.

This problem requires changes to three parts of the program. First, we can
extend \sec1063 as follows:
\Y\P$\4\X1056:Cases of \\{main\_control} that build boxes and lists\X%
\mathrel{+}\S$\6
\4$\\{non\_math}(\\{left\_brace})$: \37\&{begin} \37$\\{saved}(0)\K\\{line}$;\5
$\\{incr}(\\{save\_ptr})$;\5
$\\{new\_save\_level}(\\{simple\_group})$;\6
\&{end};\C{the line number is saved for possible use in warning message}\6
\4$\\{any\_mode}(\\{begin\_group})$: \37\&{begin} \37$\\{saved}(0)\K\\{line}$;\5
$\\{incr}(\\{save\_ptr})$;\5
$\\{new\_save\_level}(\\{semi\_simple\_group})$;\6
\&{end};\6
\4$\\{any\_mode}(\\{end\_group})$: \37\&{if} $\\{cur\_group}=\\{semi\_simple%
\_group}$ \1\&{then}\6
\&{begin} \37\\{unsave};\5
$\\{decr}(\\{save\_ptr})$;\C{pop unused line number from stack}\6
\&{end}\6
\4\&{else} \\{off\_save};\2\par
\Y\Q\noindent
A similar change is needed in \sec1068, where the first case becomes
\Y\P\4\\{simple\_group}: \37\&{begin} \37\\{unsave};\5
$\\{decr}(\\{save\_ptr})$;\C{pop unused line number from stack}\6
\&{end};\par
\Y\Q

Finally, we replace lines 6--11 of \sec1335 by code for the desired messages:
\Y\P
\&{while} $\\{cur\_level}>\\{level\_one}$ \1\&{do}\6
\&{begin} \37$\\{print\_nl}(\.{"("})$;\5
$\\{print\_esc}(\.{"end\ occurred\ when\ "})$;\6
\&{case} $\\{cur\_group}$ \1\&{of}\6
\4\\{simple\_group}: \37$\\{print\_char}(\.{"\{"})$;\6
\4\\{semi\_simple\_group}: \37$\\{print\_esc}(\.{"begingroup"})$;\6
\4\&{othercases} \37$\\{confusion}(\.{"endgroup"})$\2\6
\&{endcases};\6
$\\{print}(\.{"\ on\ line\ "})$;\5
\\{unsave};\5
$\\{decr}(\\{save\_ptr})$;\5
$\\{print\_int}(\\{saved}(0))$;\5
$\\{print}(\.{"\ was\ incomplete)"})$;\6
\&{end};\2\6
\&{while} $\\{cond\_ptr}\I\\{null}$ \1\&{do}\6
\&{begin} \37$\\{print\_nl}(\.{"("})$;\5
$\\{print\_esc}(\.{"end\ occurred\ when\ "})$;\5
$\\{print\_cmd\_chr}(\\{if\_test},\39\\{cur\_if})$;\par
\Y\Q

\def\TeXX{\TeX\kern-.3emX}
\prob 34. First, \sec2 gets a new paragraph explaining what \TeXX\ is, and the
banner line changes:
\Y\P\D \37$\\{banner}\S\.{\'This\ is\ TeXX,\ Version\ 2.2\'}$\C{printed when %
\TeX\ starts}\par\Y\Q\noindent
Then we add two new definitions in \sec134:
\Y\P\D \37$\\{is\_xchar\_node}(\#)\S(\\{font}(\#)=\\{font\_base})$\C{is this %
\\{char\_node} extended?}\par
\P\D \37$\\{bypass\_xchar}(\#)\S$\1\6
\&{if} $\\{is\_xchar\_node}(\#)$ \1\&{then}\5
$\#\K\\{link}(\#)$\2\2\par
\Y\Q\noindent (It's necessary to say \\{font\_base} here instead of
\\{null\_font}, because \\{null\_font} isn't defined until later.)

The \\{short\_display} routine of \sec174 can treat an |\xchar| like
an ordinary character, because \\{print\_ASCII} makes no restrictions. Here is
one way to handle the change:
\Y\P\4\&{procedure}\1\  \37$\\{short\_display}(\|p:\\{integer})$;\C{prints
highlights of list \|p}\6
\4\&{label} \37\\{done};\6
\4\&{var} \37\|n: \37\\{integer};\C{for replacement counts}\6
\\{ext}: \37\\{integer};\C{amount added to character code by xchar}\2\6
\&{begin} \37$\\{ext}\K0$;\6
\&{while} $\|p>\\{mem\_min}$ \1\&{do}\6
\&{begin} \37\&{if} $\\{is\_char\_node}(\|p)$ \1\&{then}\6
\&{begin} \37\&{if} $\|p\L\\{mem\_end}$ \1\&{then}\6
\&{begin} \37\&{if} $\\{is\_xchar\_node}(\|p)$ \1\&{then}\6
\&{begin} \37$\\{ext}\K256\ast(\\{qo}(\\{character}(\|p)))$;\5
\&{goto} \37\\{done};\6
\&{end};\2\6
\&{if} $\\{font}(\|p)\I\\{font\_in\_short\_display}$ \1\&{then}\6
\&{begin} \37\&{if} $(\\{font}(\|p)<\\{font\_base})\V(\\{font}(\|p)>\\{font%
\_max})$ \1\&{then}\5
$\\{print\_char}(\.{"*"})$\6
\4\&{else} \X267:Print the font identifier for $\\{font}(\|p)$\X;\2\6
$\\{print\_char}(\.{"\ "})$;\5
$\\{font\_in\_short\_display}\K\\{font}(\|p)$;\6
\&{end};\2\6
$\\{print\_ASCII}(\\{ext}+\\{qo}(\\{character}(\|p)))$;\5
$\\{ext}\K0$;\6
\&{end};\2\6
\&{end}\6
\4\&{else} \X175:Print a short indication of the contents of node \|p\X;\2\6
\4\\{done}: \37$\|p\K\\{link}(\|p)$;\6
\&{end};\2\6
\&{end};\par
\Y\Q\noindent
A somewhat similar change applies in \sec176:
\Y\P\4\&{procedure}\1\  \37$\\{print\_font\_and\_char}(\|p:\\{integer})$;%
\C{prints \\{char\_node} data}\6
\4\&{label} \37\\{reswitch};\6
\4\&{var} \37\\{ext}: \37\\{integer};\C{amount added to character code by
xchar, or $-1$}\2\6
\&{begin} \37$\\{ext}\K-1$;\6
\4\\{reswitch}: \37\&{if} $\|p>\\{mem\_end}$ \1\&{then}\5
$\\{print\_esc}(\.{"CLOBBERED."})$\6
\4\&{else} \&{begin} \37\&{if} $\\{is\_xchar\_node}(\|p)$ \1\&{then}\6
\&{begin} \37$\\{ext}\K\\{qo}(\\{character}(\|p))$;\5
$\|p\K\\{link}(\|p)$;\5
\&{goto} \37\\{reswitch};\5
\&{end};\2\6
\&{if} $(\\{font}(\|p)<\\{font\_base})\V(\\{font}(\|p)>\\{font\_max})$ \1%
\&{then}\5
$\\{print\_char}(\.{"*"})$\6
\4\&{else} \X267:Print the font identifier for $\\{font}(\|p)$\X;\2\6
$\\{print\_char}(\.{"\ "})$;\6
\&{if} $\\{ext}<0$ \1\&{then}\5
$\\{print\_ASCII}(\\{qo}(\\{character}(\|p)))$\6
\4\&{else} \&{begin} \37$\\{print\_esc}(\.{"xchar"})$;\5
$\\{print\_hex}(\\{ext}\ast256+\\{qo}(\\{character}(\|p)))$;\6
\&{end};\2\6
\&{end};\2\6
\&{end};\par
\Y\Q\noindent
(These routines must be extra-robust.)
The first line of code in \sec183 now becomes
\Y\P\&{if} $\\{is\_char\_node}(\|p)$ \1\&{then}\6
\&{begin} \37$\\{print\_font\_and\_char}(\|p)$;\5
$\\{bypass\_xchar}(\|p)$;\6
\&{end}\par\Y\Q

In \sec208 we introduce a new operation code,
\Y\P\D \37$\\{xchar\_num}=17$\C{extended character ( \.{\\xchar} )}\par
\Y\Q\noindent
Every opcode that follows it
in \sec208 and \sec209, from \\{math\_char\_num} to
\\{max\_command}, must be increased by~1.
We also add the following lines to \sec265 and \sec266, respectively:
\Y\P$\\{primitive}(\.{"xchar"},\39\\{xchar\_num},\390)$;\par
\P\4\\{xchar\_num}: \37$\\{print\_esc}(\.{"xchar"})$;\par
\Y\Q\noindent
This puts the new command into \TeXX's repertoire.

The next thing we need to worry about is what to do when |\xchar| occurs
in the input.
It's convenient to add a companion procedure to \\{scan\_char\_num} in \sec435:
\Y\P\4\&{procedure}\1\  \37\\{scan\_xchar\_num};\2\6
\&{begin} \37\\{scan\_int};\6
\&{if} $(\\{cur\_val}<0)\V(\\{cur\_val}>65535)$ \1\&{then}\6
\&{begin} \37$\\{print\_err}(\.{"Bad\ character\ code"})$;\5
$\\{help2}(\.{"An\ \\xchar\ number\ must\ be\ between\ 0\ and\ 255."})$\6
$(\.{"I\ changed\ this\ one\ to\ zero."})$;\5
$\\{int\_error}(\\{cur\_val})$;\5
$\\{cur\_val}\K0$;\6
\&{end};\2\6
\&{end};\par\Y\Q\noindent
Similarly, \\{new\_character} gets a companion in \sec582:
\Y\P\4\&{function}\1\  \37$\\{new\_xchar}(\|f:\\{internal\_font\_number};\,\35\|c:%
\\{integer})$: \37\\{pointer};\6
\4\&{var} \37$\|p,\39\|q$: \37\\{pointer};\C{newly allocated nodes}\2\6
\&{begin} \37$\|q\K\\{new\_character}(\|f,\39\|c\mathbin{\&{mod}}256)$;\6
\&{if} $\|q=\\{null}$ \1\&{then}\5
$\\{new\_xchar}\K\\{null}$\6
\4\&{else} \&{begin} \37$\|p\K\\{get\_avail}$;\5
$\\{font}(\|p)\K\\{font\_base}$;\5
$\\{character}(\|p)\K\\{qi}((\|c\mathbin{\&{div}}256))$;\5
$\\{link}(\|p)\K\|q$;\5
$\\{new\_xchar}\K\|p$;\6
\&{end};\2\6
\&{end};\par
\Y\Q

\goodbreak
Extended characters can be output properly if we replace the opening lines
of the code in \sec620 by these:
\Y\P\4\\{reswitch}: \37\&{if} $\\{is\_char\_node}(\|p)$ \1\&{then}\6
\&{begin} \37\\{synch\_h};\5
\\{synch\_v};\6
\1\&{repeat} \37\&{if} $\\{is\_xchar\_node}(\|p)$ \1\&{then}\6
\&{begin} \37$\|f\K\\{font}(\\{link}(\|p))$;\6
\&{if} $\\{character}(\|p)=\\{qi}(0)$ \1\&{then}\5
$\|p\K\\{link}(\|p)$;\C{bypass zero extension}\2\6
\&{end}\6
\4\&{else} $\|f\K\\{font}(\|p)$;\2\6
$\|c\K\\{character}(\|p)$;\6
\&{if} $\|f\I\\{dvi\_f}$ \1\&{then}\5
\X621:Change font \\{dvi\_f} to \|f\X;\2\6
\&{if} $\\{is\_xchar\_node}(\|p)$ \1\&{then}\6
\&{begin} \37$\\{dvi\_out}(\\{set1}+1)$;\5
$\\{dvi\_out}(\\{qo}(\|c))$;\5
$\|p\K\\{link}(\|p)$;\5
$\|c\K\\{character}(\|p)$;\6
\&{end}\6
\4\&{else} \&{if} $\|c\G\\{qi}(128)$ \1\&{then}\5
$\\{dvi\_out}(\\{set1})$;\2\2\6
$\\{dvi\_out}(\\{qo}(\|c))$;\par
\Y\Q

Many of the processing routines include a statement of the form `$f\K\\{font}(\#)$',
which we want to do only after bypassing the first half of an extended character.
This can be done by inserting the following statements:
$$\advance\baselineskip2pt
\vbox{\halign{\\{bypass\_xchar}(#) \hfil&in \sec#\hfil\cr
$p$&654;\cr
$s$&842;\cr
\\{cur\_p}&867;\cr
$s$&871;\cr
$p$&1147.\cr}}$$
In \sec841 we need to do a little more than a simple bypass:
\Y\P\&{if} $\\{is\_char\_node}(\|v)$ \1\&{then}\6
\&{begin} \37\&{if} $\\{is\_xchar\_node}(\|v)$ \1\&{then}\6
\&{begin} \37$\|v\K\\{link}(\|v)$;\5
$\\{decr}(\|t)$;\C{an xchar counts as two chars}\6
\&{end};\par
\Y\Q

Two changes are needed in order to suppress hyphenation in words that contain
extended characters. First we insert
\Y\P\&{if} $\\{hf}=\\{font\_base}$ \1\&{then}\5
\&{goto} \37\\{done1};\C{$\\{is\_xchar\_node}(\|s)$}\par\Y\Q
\noindent after the third line of \sec896. Then we replace `{\bf endcases};'
in \sec899 by
\Y\P\&{endcases}\6
\4\&{else} \&{if} $\\{is\_xchar\_node}(\|s)$ \1\&{then}\5
\&{goto} \37\\{done1};\par\Y\Q

If\/ |\xchar| appears in math mode, we want to recover from the error by
including $\\{mmode}+\\{xchar\_num}$ in the list of cases in \sec1046.
If\/ |\xchar| appears in vertical mode, we want to begin a paragraph by
including $\\{vmode}+\\{xchar\_num}$ in the second list of cases in \sec1090.
\Y

But what it |\xchar| appears in horizontal mode? To handle this, we might
as well rewrite \sec1122:

\M1122. We need only two more things to complete the horizontal mode
routines, namely
the \.{\\xchar} and \.{\\accent} primitives.

\Y\P$\4\X1056:Cases of \\{main\_control} that build boxes and lists\X%
\mathrel{+}\S$\6
\4$\\{hmode}+\\{xchar\_num}$: \37\&{begin} \37\\{scan\_xchar\_num};\5
$\\{link}(\\{tail})\K\\{new\_xchar}(\\{cur\_font},\39\\{cur\_val})$;\6
\&{if} $\\{link}(\\{tail})\I\\{null}$ \1\&{then}\5
$\\{tail}\K\\{link}(\\{link}(\\{tail}))$;\2\6
$\\{space\_factor}\K1000$;\6
\&{end};\6
\4$\\{hmode}+\\{accent}$: \37\\{make\_accent};\par
\fi

\goodbreak
\def\ellipsis{\kern50pt\smash{\vdots}\qquad\vbox to12pt{}\par}
\Y\Q Finally, we need to extend \\{make\_accent} so that extended characters
can be accented. (Problem~34 didn't call for this explicitly, but \TeXX\
should surely do it.) This means adding a new case in \sec1124:
\Y\P\4\&{else} \&{if} $\\{cur\_cmd}=\\{xchar\_num}$ \1\&{then}\6
\&{begin} \37\\{scan\_xchar\_num};\5
$\|q\K\\{new\_xchar}(\|f,\39\\{cur\_val})$;\6
\&{end}\par
\Y\Q\noindent
and making changes at the beginning and end of \sec1125:
\Y\P$\4\X1125\*:Append the accent with appropriate kerns, then set $\|p\K\|q$\X%
\S$\6
\&{begin} \37$\|t\K\\{slant}(\|f)/\\{float\_constant}(65536)$;\6
\&{if} $\\{is\_xchar\_node}(\|q)$ \1\&{then}\5
$\|i\K\\{char\_info}(\|f)(\\{character}(\\{link}(\|q)))$\6
\4\&{else} $\|i\K\\{char\_info}(\|f)(\\{character}(\|q))$;\2\6
$\|w\K\\{char\_width}(\|f)(\|i)$;\6
\ellipsis\6
$\\{subtype}(\\{tail})\K\\{acc\_kern}$;\5
$\\{link}(\|p)\K\\{tail}$;\6
\&{if} $\\{is\_xchar\_node}(\|q)$ \1\&{then}\C{in this case we want to bypass
the xchar part}\6
\&{begin} \37$\\{tail\_append}(\|q)$;\5
$\|p\K\\{link}(\|q)$;\6
\&{end}\6
\4\&{else} $\|p\K\|q$;\2\6
\&{end}\par
\Q

\prob 35. The main reason for preferring the method of problem 34 is that
the italic correction operation (\sec1113) would be extremely difficult with
the other scheme. Other advantages are: (a)~Division by 256 is needed only once;
\TeXX's main loops remain fast. (b)~Comparatively few changes from \TeX\
itself are needed, hence other ripoffs of \TeX\ can easily incorporate the
same ideas. (c)~Since fonts don't need to be segregated into `oriental'
and `occidental', |\xchar| has wide applicability. For example, it gives
users a way to suppress ligatures and kerns; it allows large fonts to have
efficient
256-character subsets of commonly-used characters. (d)~The conventions of
\TeXX\ match those of the |GF| files produced by {\logo METAFONT}.

The only disadvantage of the \TeXX\ method is that it requires all characters
whose codes differ by multiples of~256 to have the same box size. But this
is a minor consideration.

\bigbreak
\noindent{\bf Appendix}
\nobreak\smallskip\noindent
The solution to problem 18 refers to a special version of \TeX\ called
Demo\TeX, which allows users to see more details of the scanning process.
Demo\TeX\ is formed by making a few changes to parts 24--26 of \TeX.

First, in \sec341, the following code is placed between `\\{exit}:'\ and
`{\bf end}':
\Y\P
\37\&{if} $\\{tracing\_stats}>2$ \1\&{then}\6
\&{begin} \37$\|k\K\\{trace\_depth}$;\5
$\\{print\_nl}(\.{""})$;\6
\&{while} $\|k>0$ \1\&{do}\6
\&{begin} \37$\\{print}(\.{"\ "})$;\5
$\\{decr}(\|k)$;\6
\&{end};\2\6
$\\{print}(\.{"\|"})$;\5
$\\{print\_char}(\.{"\ "})$;\6
\&{if} $\\{cur\_cs}>0$ \1\&{then}\6
\&{begin} \37$\\{print\_cs}(\\{cur\_cs})$;\5
$\\{print\_char}(\.{"="})$;\6
\&{end};\2\6
$\\{print\_cmd\_chr}(\\{cur\_cmd},\39\\{cur\_chr})$;\6
\&{end};\par
\Y\Q\noindent
(A new global variable, \\{trace\_depth}, is declared somewhere and initialized
to zero. It is used to indent the output of Demo\TeX\ so that the depth of
subroutine nesting is displayed.)

At the beginning of \\{expand} (in \sec366), we put the statements
\Y\P
$\\{incr}(\\{trace\_depth})$;\6
\&{if} $\\{tracing\_stats}>2$ \1\&{then}\5
$\\{print}(\.{"\ <x"})$;\par
\Y\Q\noindent
this prints `|<x|' when \\{expand} begins to expand something. The same statements
are inserted at the beginning of \\{scan\_int} (\sec400), \\{scan\_dimen}
(\sec448), and \\{scan\_glue} (sec461),
except that \\{scan\_int} prints `|<i|', \\{scan\_dimen}
prints `|<d|', and \\{scan\_glue} prints `|<g|'. (Get it?) We also insert
complementary code at the end of each of these procedures:
\Y\P
$\\{decr}(\\{trace\_depth})$;\6
\&{if} $\\{tracing\_stats}>2$ \1\&{then}\5
$\\{print\_char}(\.{">"})$;\par
\Y\Q\noindent
this makes it clear when each part of the scanner has done its work.

Finally, \\{scan\_keyword} is instrumented in a similar way, but with explicit
information about what keyword it is seeking. The code
\Y\P$\\{incr}(\\{trace\_depth})$;\6
\&{if} $\\{tracing\_stats}>2$ \1\&{then}\6
\&{begin} \37$\\{print}(\.{"\ <\'"})$;\5
$\\{print}(\|s)$;\5
$\\{print\_char}(\.{"\'"})$;\6
\&{end};\par
\Y\Q\noindent
is inserted at the beginning of \sec407, and
\Y\P\&{if} $\\{tracing\_stats}>2$ \1\&{then}\5
$\\{print\_char}(\.{"*"})$;\2\6
\4\\{exit}: \37$\\{decr}(\\{trace\_depth})$;\6
\&{if} $\\{tracing\_stats}>2$ \1\&{then}\5
$\\{print\_char}(\.{">"})$;\2\6
\&{end};\par\Y\Q\noindent
replaces the code at the end. (Here `|*|' denotes `success': the keyword was found.)

For example, here's the beginning
of what Demo\TeX\ prints out when scanning the right-hand
side of the assignment to |\hfuzz| in problem~18:
\begintt
|! the character = <d
 |! the character 1 <i
  |! the character 1
  |! the character 0
  |! the character 0
  |! the letter P>
 |! the letter P <'em'
  |! the letter P> <'ex'
  |! the letter P> <'true'
  |! the letter P> <'pt'
  |! the letter P
  |! \ifdim =\ifdim <x <d
    |! the character 1 <i
     |! the character 1
     |! the character 2
     |! the letter p>
    |! the letter p <'em'
     |! the letter p> <'ex'
     |! the letter p> <'true'
     |! the letter p> <'pt'
     |! the letter p
     |! the letter t*>
    |! the character =>
\endtt
(After seeing `|=|', \TeX\ calls \\{scan\_dimen}. The next character seen is
`|1|'; \\{scan\_dimen} puts it back to be read again and calls \\{scan\_int},
which finds `|100|', etc.
This output demonstrates the fact that
\TeX\ frequently uses \\{back\_input} to reread a character,
when it isn't quite ready to deal with that character.)

\bigbreak
\noindent{\bf Acknowledgement}
\nobreak\smallskip\noindent
I wish to thank the brave students of my experimental class
for motivating me to think of these questions, for sticking with me
when the questions were impossible to understand,
and for making many improvements to my original answers.

\endgroup\bye